Method and apparatus for generating a database of road sign images and positions

ABSTRACT

The present invention relates to an apparatus for rapidly analyzing frame(s) of digitized video data which may include objects of interest randomly distributed throughout the video data and wherein said objects are susceptible to detection, classification, and ultimately identification by filtering said video data for certain differentiable characteristics of said objects. The present invention may be practiced on pre-existing sequences of image data or may be integrated into an imaging device for real time, dynamic, object identification, classification, logging/counting, cataloging, retention (with links to stored bitmaps of said object), retrieval, and the like. The present invention readily lends itself to the problem of automatic and semi-automatic cataloging of vast numbers of objects such as traffic control signs and utility poles disposed in myriad settings. When used in conjunction with navigational or positional inputs, such as GPS, an output from the inventive system indicates the identity of each object, calculates object location, classifies each object by type, extracts legible text appearing on a surface of the object (if any), and stores a visual representation of the object in a form dictated by the end user/operator of the system. The output lends itself to examination and extraction of scene detail which cannot practically be successfully accomplished with just human viewers operating video equipment, although human intervention can still be used to help judge and confirm a variety of classifications of certain instances and for types of identified objects.

FIELD OF THE INVENTION

[0001] The present invention relates generally to the field of automated image identification and, in particular, to the identification of objects depicted in one or more image frames of a segment of video. The present invention teaches methods for rapidly scrutinizing digitized image frames and classifying and cataloging objects of interest depicted in the video segment by filtering said image frames for various differentiable characteristics of said objects and extracting relevant data about said objects while ignoring other features of each image frame.

BACKGROUND OF THE INVENTION

[0002] Prior art devices described in the relevant patent literature for capturing one or more objects in a scene typically include a camera device of known location or trajectory, a scene including one or more calibrated target objects, and at least one object of interest (see U.S. Pat. No. 5,699,444 to Sythonics Incorporated). Most prior art devices used for capture of video data regarding an object operate in a controlled setting, oftentimes in studios or sound stages, and are articulated along a known or preselected path (circular or linear). Thus, the information recorded by the device can be more easily interpreted and displayed given the strong correlation between the perspective of the camera and the known objects in the scene.

[0003] To capture data regarding objects present in a scene, a number of techniques have been successfully practiced. For example, U.S. Pat. No. 5,633,944 entitled “Method and Apparatus for Automatic Optical Recognition of Road Signs” issued May 27, 1997 to Guibert et al. and assigned to Automobiles Peugeot discloses a system wherein a laser beam, or other source of coherent radiation, is used to scan the roadside in an attempt to recognize the presence of signs.

[0004] Additionally, U.S. Pat. No. 5,790,691 entitled “Method and Apparatus for Robust Shape Detection Using a Hit/Miss Transform” issued Aug. 4, 1998 to Narayanswamy et al. and assigned to the Regents of the University of Colorado (Boulder, Colo.) discloses a system for detecting abnormal cells in a cervical Pap-smear. In this system a detection unit inspects a region of interest present in two dimensional input images and morphologically detects structure elements preset by a system user. By further including a thresholding feature, the shapes and/or features recorded in the input images can deviate from structuring elements and still be detected as a region of interest. This reference clearly uses extremely controlled conditions, known presence of objects of interest, and continually fine-tuned filtering techniques to achieve reasonable performance. Similarly, U.S. Pat. No. 5,627,915 entitled “Pattern Recognition System Employing Unlike Templates to Detect Objects Having Distinctive Features in a Video Field” issued May 6, 1997 to Rosser et al. and assigned to Princeton Video Image, Inc. of Princeton, N.J. discloses a method for rapidly and efficiently identifying landmarks and objects using a plurality of templates that are sequentially created and inserted into live video fields and compared to a prior template(s) in order to successively identify possible distinctive feature candidates of a live video scene and also eliminate falsely identified features. The process disclosed by Rosser et al. is repeated in order to preliminarily identify two or three landmarks of the target object and the locations of these “landmarks,” and finally said landmarks are compared to a geometric model to further verify, by process of elimination, whether the object has been correctly identified. The methodology lends itself to laboratory verification against pre-recorded videotape to ascertain accuracy before applying said system to actual targeting of said live objects. This system also requires specific templates of real world features and does not operate on unknown video data with its inherent variability of lighting, scene composition, weather effects, and placement variation from said templates to actual conditions in the field.

[0005] Further prior art includes U.S. Pat. No. 5,465,308 entitled “Pattern Recognition System” issued Nov. 7, 1995 to Hutcheson et al. and assigned to Datron/Transoc, Inc. of Simi Valley, Calif., which discloses a method and apparatus under software control that uses a neural network to recognize two dimensional input images which are sufficiently similar to a database of previously stored two dimensional images. The images are processed and subjected to a Fourier transform (which yields a power spectrum), and then an in-class/out-of-class sort is performed. A feature vector consisting of the most discriminatory magnitude information from the power spectrum is then created and input to a neural network preferably having two hidden layers, input dimensionality of elements of the feature vector, and output dimensionality of the number of data elements stored in the database. Unique identifier numbers are preferably stored along with the feature vector. Applying a query feature vector to the neural network results in an output vector which is subjected to statistical analysis to determine whether a threshold level of confidence exists before indicating successful identification has occurred. Where a successful identification has occurred, a unique identifier number for the identified object may be displayed to the end user to so indicate. However, Fourier transforms are subject to large variations in frequency such as those brought on by shading, or by other temporary or partial obscuring of objects from things like leaves and branches from nearby trees, scratches, and bullet holes (especially if used for recognizing road signs). Moreover, commercial signage, windshields, and other reflecting surfaces (e.g., windows) all have very similar characteristics to road signs in the frequency domain.

[0006] In summary, the inventors have found that in the prior art related to the problem of accurately identifying and classifying objects appearing in video data, almost all efforts utilize complex processing, illuminated scenes, continual tuning of a single filter, and/or systematic comparison of aspects of an unknown object with a variety of shapes stored in memory. The inventors propose a system that efficiently and accurately retrieves and catalogs information distilled from vast amounts of video data so that object classification type(s), locations, and bitmaps depicting the actual condition of the objects (when originally recorded) are available to an operator for review, comparison, or further processing to reveal even more detail about each object and relationships among objects.

[0007] The present invention thus finds utility over this variety of prior art methods and devices and solves a long-standing need in the art for a simple apparatus for quickly and accurately recognizing, classifying, and locating each of a variety of objects of interest appearing in a videostream. This includes determining that an object depicted in one image frame is the “same” object depicted in a distinct image frame.

[0008] The present invention addresses an urgent need for virtually automatic processing of vast amounts of video data (that possibly depict one or more desired objects) so as to precisely recognize, accurately locate, extract desired characteristics of, and, optionally, archive bitmap images of each said recognized object. Processing such video information via computer is preferred over all other forms of data interrogation, and the inventors suggest that such processing can accurately and efficiently complete a task such as identifying and cataloguing huge numbers of objects of interest to many public works departments and utilities; namely, traffic signs, traffic lights, man holes, power poles, and the like disposed in urban, suburban, residential, and commercial settings among various types of natural terrain and changing lighting conditions (i.e., the sun).

SUMMARY OF THE INVENTION

[0009] The exemplary embodiment described, enabled, and taught herein is directed to the task of building a database of road signs by type, location, orientation, and condition by processing vast amounts of video image frame data. The image frame data depict roadside scenes as recorded from a vehicle navigating said road. By utilizing differentiable characteristics, the portions of the image frame that depict a road sign are stored as highly compressed bitmapped files, each linked to a discrete data structure containing one or more of the following memory fields: sign type, relative or absolute location of each sign, reference value for the recording camera, and reference value for the original recorded frame number for the bitmap of each recognized sign. The location data is derived from at least two depictions of a single sign using techniques of triangulation, correlation, or estimation. Thus, output signal sets resulting from application of the present method to a segment of image frames can include a compendium of data about each sign and bitmap records of each sign as recorded by a camera. Thus, records are created for image-portions that possess (and exhibit) detectable unique differentiable characteristics versus the majority of other image-portions of a digitized image frame. In the exemplary sign-finding embodiment herein these differentiable characteristics are coined “sign-ness.” Thus, based on said differentiable characteristics, or sign-ness, information regarding the type, classification, condition (linked bitmap image portion), and/or location of road signs (and image-portions depicting said road signs) is rapidly extracted from image frames. Those image frames that do not contain an appreciable level of sign-ness are immediately discarded.

[0010] Differentiable characteristics of said objects include convexity/symmetry, lack of 3D volume, number of sides, angles formed at corners of signs, luminescence or lumina values (which represent an illumination-tolerant response in the L*u*v* or LCH color spaces, typically following a transforming step from a first color space like RGB), relationship of edges extracted from portions of image frames, shape, texture, and/or other differentiable characteristics of one or more objects of interest versus background objects. The differentiable characteristics are preferably tuned with respect to the recording device and actual or anticipated recording conditions, as taught more fully hereinbelow.

[0011] The method and apparatus of the present invention rapidly identifies, locates, and stores images of objects depicted in digitized image frames based upon one or more differentiable characteristics of the objects (e.g., versus non-objects and other detected background noise). The present invention may be implemented in a single microprocessor apparatus, within a single computer having multiple processors, among several locally-networked processors (i.e., an intranet), or via a global network of processors (i.e., the internet and similar). Portions of individual image frames exhibiting an appreciable level of pre-selected differentiable characteristics of desired objects are extracted from a sequence of video data, and said portions of the individual frames (and correlating data thereto) are used to confirm that a set of several “images” in fact represent a single “object” of a class of objects. These preselected differentiable characteristic criteria are chosen from among a wide variety of detectable characteristics including color characteristics (color-pairs and color set memberships), edge characteristics, symmetry, convexity, lack of 3D volume, number and orientation of side edges, characteristic corner angles, frequency, and texture characteristics displayed by the 2-dimensional (2D) images so that said objects can be rapidly and accurately recognized. Preferably, the differentiable characteristics are chosen with regard to anticipated camera direction relative to anticipated object orientation so that needless processing overhead is avoided in attempting to extract features and characteristics likely not present in a given image frame set from a known camera orientation. Similarly, in the event that a scanning recording device, or devices, are utilized to record objects populating a landscape, area, or other space, the extraction devices can preferably be applied only to those frames that likely will exhibit appreciable levels of an extracted feature or characteristic.

[0012] In a preferred embodiment, the inventive system taught herein is applied to image frames and, unless at least one output signal from an extraction filter preselected to capture or highlight a differentiable characteristic of an object of interest exceeds a threshold value, the then-present image frame is discarded. For those image frames not discarded, an output signal set of location, type, condition, and classification of each identified sign is produced and linked to at least one bitmap image of said sign. The output signal set and bitmap record(s) are thus available for later scrutiny, evaluation, processing, and archiving. Of course, prefiltering or conditioning the image frames may increase the viability of practicing the present invention. Some examples include color calibration, color density considerations, video filtering during image capture, etc.
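By way of non-limiting illustration only, the frame-culling step described above might be sketched as follows. The frame representation (a flat list of RGB pixels), the two color-fraction filters, their pixel cutoffs, and the threshold value are all hypothetical stand-ins chosen for the sketch and are not taken from the disclosure; any extraction filter that returns a scalar output signal could be substituted.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Pixel = Tuple[int, int, int]          # hypothetical RGB pixel
Frame = List[Pixel]                   # hypothetical flat frame representation
Filter = Callable[[Frame], float]     # an extraction filter returns one scalar signal


def red_fraction(frame: Frame) -> float:
    """Hypothetical filter: fraction of strongly red pixels in the frame."""
    hits = sum(1 for r, g, b in frame if r > 200 and g < 80 and b < 80)
    return hits / len(frame)


def yellow_fraction(frame: Frame) -> float:
    """Hypothetical filter: fraction of strongly yellow pixels in the frame."""
    hits = sum(1 for r, g, b in frame if r > 200 and g > 200 and b < 80)
    return hits / len(frame)


def cull_frames(frames: Iterable[Frame],
                filters: List[Filter],
                threshold: float) -> Iterator[Tuple[int, Frame]]:
    """Yield (frame_number, frame) only when at least one filter output
    exceeds the threshold; this is the logical OR of the filter outputs.
    All other frames are discarded without further processing."""
    for number, frame in enumerate(frames):
        if any(f(frame) > threshold for f in filters):
            yield number, frame


gray = [(128, 128, 128)] * 100
with_red = [(128, 128, 128)] * 90 + [(230, 20, 20)] * 10
with_yellow = [(128, 128, 128)] * 90 + [(240, 220, 30)] * 10

kept = [n for n, _ in cull_frames([gray, with_red, with_yellow],
                                  [red_fraction, yellow_fraction], 0.05)]
print(kept)
```

Because `any` stops at the first filter whose output exceeds the threshold, a retained frame costs no more filter evaluations than necessary, while a discarded frame is evaluated by every filter exactly once.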

[0013] In a general embodiment of the present invention, differentiable characteristics present in just two (2) images of a given object are used to confirm that the images in fact represent a single object without any further information regarding the location, direction, or focal length of an image acquisition apparatus (e.g., digital camera) that recorded the initial at least two image frames. However, if the location of the digital camera or vehicle conveying said digital camera (and the actual size of the object to be found) are known, just a single (1) image of an object provides all the data required to recognize and locate the object.
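As a hedged illustration of the single-image case, the standard pinhole camera relation gives the range to an object of known actual size from its apparent size in the image. The sketch below assumes, beyond what is stated above, that the focal length is available in pixel units and that the bearing to the object in a planar map frame has already been obtained; the function names and the numeric values are hypothetical.

```python
import math
from typing import Tuple


def range_from_single_image(focal_length_px: float,
                            actual_height_m: float,
                            image_height_px: float) -> float:
    """Pinhole model: distance = focal length * actual size / apparent size."""
    if image_height_px <= 0:
        raise ValueError("apparent height must be positive")
    return focal_length_px * actual_height_m / image_height_px


def locate(camera_xy: Tuple[float, float],
           bearing_rad: float,
           distance_m: float) -> Tuple[float, float]:
    """Project the range along a bearing (radians from the +x axis)
    from the known camera position to estimate the object position."""
    x, y = camera_xy
    return (x + distance_m * math.cos(bearing_rad),
            y + distance_m * math.sin(bearing_rad))


# Hypothetical numbers: a 0.75 m tall sign spanning 50 pixels, focal length 1000 px.
distance = range_from_single_image(1000.0, 0.75, 50.0)
print(distance, locate((0.0, 0.0), 0.0, distance))
```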

[0014] The present invention has been developed to identify traffic control, warning, and informational signs, “road signs” herein, that appear adjacent to a vehicle right-of-way, are visible from said right-of-way, and are not obscured by non-signs. These road signs typically follow certain rules and regulations relative to size, shape, color (and allowed color combinations), placement relative to vehicle pathways (orthogonal), and sequencing relative to other classes of road signs. While the term “road sign” is used throughout this written description of the present invention, a person of ordinary skill in the art to which the invention is directed will certainly realize applications of the present invention to other similar types of object recognition. For example, the present invention may be used to recognize, catalogue, and organize searchable data relative to signs adjacent a railroad right-of-way, nature trailways, recreational vehicle paths, commercial signage, utility poles, pipelines, billboards, man holes, and other objects of interest that are amenable to video capture techniques and that inherently possess differentiable characteristics relative to their local environment. Of course, the present invention may be practiced with imaging systems ranging from monochromatic visible wavelength camera/film combinations to full color spectrum visible wavelength camera/memory combinations to ultraviolet, near infrared, or infrared imaging systems, so long as basic criteria are present: object differentiability from its immediate milieu or range data.

[0015] Thus, the present invention transforms frames of digital video depicting roadside scenes using a set of filters that are logically combined together with OR gates or combined algorithmically, with each output equally weighted, and that each operate quickly to capture a differentiable characteristic of one or more road signs of interest. Frequency and spatial domain transformation, edge domain transformation (Hough space), color transformation typically from a 24 bit RGB color space to either a L*u*v* or LCH color space (using either fuzzy color set tuning or neural network tuning for objects displaying a differentiable color set), in addition to use of morphology (erosion/dilation) and a moment calculation applied to a previously segmented image frame, are used to determine whether an area of interest that contains an object is actually a road sign. The aspect ratio and size of a potential object of interest (an “image” herein) can be used to confirm that an object is very likely a road sign. If none of the filters produces an output signal greater than a noise level signal, that particular image frame is immediately discarded. The inventors note that in their experience, if the recording device is operating in an urban setting with a recording vehicle operating at normal urban driving speeds and the recording device has a standard frame rate (e.g., thirty frames per second), only about twelve (12) frames per thousand (1.2%) have images, or portions of image frames, that potentially correlate to a single road sign of sufficiently detectable size. Typically only four (4) frames per thousand actually contain an object of interest, or road sign in the exemplary embodiment. Thus, a practical requirement for a successful object recognition method is the ability to rapidly cull the ninety-eight percent (98%) of frames that do not assist the object recognition process. In reality, more image frames contain some visible cue as to the presence of a sign in the image frame, but most of the differentiable data is typically recorded by the best eight (8) or so images of each potential object of interest. The image frames are typically coded to correspond to a camera number (if multiple cameras are used) and camera location data (i.e., absolute location via GPS, or inertial coordinates if INS is coupled to the camera or camera-carrying vehicle). If the location data comprises a time/position database directly related to frame number (and camera information in a multi-camera imaging system), extremely precise location information is preferably derived using triangulation of at least two of the related “images” of a confirmed object (road sign).
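The triangulation of two related “images” of a confirmed object may be illustrated, without limitation, by intersecting two bearing rays in a plane. The sketch assumes, beyond what is stated above, that each frame's camera position is expressed in planar map coordinates and that a bearing toward the sign has already been derived for each frame (for example, from vehicle heading and the sign's horizontal offset in the image); the function name and the sample coordinates are hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def triangulate(p1: Point, bearing1: float,
                p2: Point, bearing2: float) -> Optional[Point]:
    """Intersect the ray from p1 along bearing1 with the ray from p2 along
    bearing2 (radians, measured from the +x axis). Returns None when the
    bearings are parallel, in which case no single intersection exists."""
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    # Solve p1 + t*d1 == p2 + s*d2 for t using 2D cross products.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# Hypothetical example: two camera positions 10 m apart sighting the same sign.
sign = triangulate((0.0, 0.0), math.pi / 4, (10.0, 0.0), 3 * math.pi / 4)
print(sign)
```

With more than two related images, the pairwise intersections could be averaged or fitted by least squares; that refinement is omitted from this sketch.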

[0016] The present invention successfully handles partially obscured signs, skewed signs, poorly illuminated signs, signs only partially present in an image frame, and bent signs, and ignores all other information present in the stream of digital frame data (preferably even the posts that support the signs). One of skill in the art will quickly recognize that the exemplary system described herein with respect to traffic control road signs is readily adaptable to other similar identification of a large variety of man-made structures. For example, cataloging the location, direction the camera is facing, condition, orientation, and other attributes of objects such as power poles, telephone poles, roadways, railways, and even landmarks to assist navigation of vehicles can be successfully completed by implementing the inventive method described herein upon a series of images of said objects. In a general embodiment, the present invention can quickly and accurately distill arbitrary/artificial objects disposed in natural settings and, except for confirming at least one characteristic of the object (e.g., color, linear shape, aspect ratio, etc.), the invention operates successfully without benefit of pre-existing knowledge about the full shape, actual condition, or precise color of the actual object.

[0017] The present invention is best illustrated with reference to one or more preferred embodiments wherein a series of image frames (each containing a digital image of at least a portion of an object of interest) are received, at least two filters (or segmentation algorithms) are applied, and spectral data of the scene is scrutinized so that those discrete images that exceed at least one threshold of one filter during extraction processing become the subject of more focused filtering over an area defined by the periphery of the image. The periphery area of the image is found by applying common region growing and merging techniques to grow common-color areas appearing within an object. The fuzzy logic color filter screens for the color presence and may be implemented as a neural network. In either event, an image area exhibiting a peak value representative of a color set which strongly correlates to a road sign of interest is typically maintained for further processing. If and only if the color segmentation routine fails, a routine to determine the strength of the color pair output is then applied to each image frame that positively indicated presence of a color pair above the threshold noise level. Then further segmentation is done, possibly using color, edges, adaptive thresholding, color frequency signatures, or moment calculations. Preferably the image frame is segmented into an arbitrary number of rectangular elements (e.g., 32 or 64 segments). The area where the color pair was detected is preferably grown to include adjacent image segments that also exhibit an appreciable color-pair signal in equal numbered segments. This slight expansion of a search space during the moment routine does not appreciably reduce system throughput in view of the additional confirming data derived by expanding the space. Morphology techniques are then preferably used to grow and erode the area defined by the moment routine-segmented space until the grown representation either meets or fails to meet uniform criteria during the dilation and erosion of the now segmented image portion of the potential object (“image”). If the image area meets the morphological criteria, a final image periphery is calculated. Preferably this final image periphery includes less than the maximum, final grown image so that potential sources of error, such as non-uniform edges and other potentially complex pixel data, are avoided and the final grown representation of the image essentially includes only the actual colored “face” of the road sign. A second order calculation can be completed using the basic segmented moment space which determines the “texture” of the imaged area, although the inventors of the present invention typically do not routinely sample for texture.

[0018] The face of the road sign can be either the colored front portion of a road sign or the typically unpainted back portion of a road sign (if not obscured by a sign mounting surface). For certain classes of road signs, the outline of the sign is all that is needed to accurately recognize the sign. One such class is the ubiquitous eight-sided stop sign. A “bounding box” is defined herein as a polygon which follows the principal axis of the object. Thus, rotation, skew of a camera or a sign, and bent signs are not difficult to identify. The principal axis is a line through the center of mass and at least one edge having a minimum distance to all pixels of the object. In this way a bounding box will follow the outline of a sign without capturing non-sign image portions.
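One common way to obtain such a principal axis, offered here only as an illustrative sketch, is from the second-order central moments of the object's pixels: the resulting line passes through the center of mass and minimizes the sum of squared perpendicular distances to the pixels. Whether the disclosed system computes the axis in exactly this manner is not stated; the function name and the pixel-list representation are assumptions of the sketch.

```python
import math
from typing import List, Tuple


def principal_axis(pixels: List[Tuple[float, float]]) -> Tuple[Tuple[float, float], float]:
    """Return (center_of_mass, angle) for a set of object pixel coordinates.
    The angle (radians from the +x axis) is the orientation of the
    least-squares line through the center of mass."""
    n = len(pixels)
    if n == 0:
        raise ValueError("at least one pixel is required")
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments of the pixel distribution.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), angle


center, angle = principal_axis([(0, 0), (1, 1), (2, 2)])
print(center, math.degrees(angle))
```

A bounding polygon aligned to this axis can then be formed by projecting the pixels onto the axis and its perpendicular and taking the extreme values, so that a rotated or skewed sign is enclosed without excess background.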

[0019] Then, the aspect ratio of the finally grown image segments is calculated and compared against a threshold aspect ratio set (three are used herein, each corresponding to one or more classes of road signs) and, if the value falls within preset limits or meets other criteria such as a percentage of color (# of pixels), moments, number of corners, corner angles, etc., the image portion (road sign face) is saved in a descending ordered listing of all road signs of the same type (where the descending order corresponds to the magnitude or strength of other depictions of possible road signs). For a class of road signs where the sign only appears as a partial sign image, the inventors do not need special processing, since only three intersecting edges (extracted via a Hough space transformation), grown together if necessary, in addition to color-set data are required to recognize most every variety of road sign. The aspect ratio referred to above can be computed from one of at least three types of bounding shape: a rectangular (or polygon) shape, an ellipse-type shape, or a shape that is mathematically related to a circularity-type shape. For less than four-sided signs the rectangular polygon shapes are used, and for more than four sides the ellipse-type shapes are used.

[0020] The frame buffer is typically generated by a digital image capture device. However, the present invention may be practiced in a system directly coupled to a digital image capture apparatus that is recording live images, or a pre-recorded set of images, or a series of still images, or a digitized version of an original analog image sequence. Thus, the present invention may be practiced in real time, near real time, or long after initial image acquisition. If the initial image acquisition is analog, it must first be digitized prior to subjecting the image frames to analysis in accordance with the invention herein described, taught, enabled, and claimed. Also, a monitor can be coupled to the processing equipment used to implement the present invention so that manual intervention and/or verification can be used to increase the accuracy of the ultimate output: a synchronized database of characteristic type(s), location(s), number(s), and damaged and/or missing objects.

[0021] Thus the present invention creates at least a single output for each instance where an object of interest was identified. Further embodiments include an output comprising one or more of the following: orientation of the road sign image, location of each identified object, type of object located, entry of object data into an Intergraph GIS database, and bitmap image(s) of each said object available for human inspection (printed and/or displayed on a monitor), and/or archived, distributed, or subjected to further automatic or manual processing.

[0022] Given the case of identifying every traffic control sign in a certain jurisdiction, the present invention is applied to scrutinize standard videostream of all roadside scenes present in said jurisdiction. Most jurisdictions authorize road signs to be painted or fabricated only with specific discrete color-pairs, and in some cases color-sets (e.g., typically having between one and four colors), for use as traffic control signage. The present invention exploits this feature in an exemplary embodiment wherein these discrete color-sets form a differentiable criterion. Furthermore, in this embodiment a neural network is rapidly and efficiently trained to recognize regions in the image frames that contain these color-sets. Examples of said color sets presently useful in recognizing road signs in the U.S. include: red/white, white/black/red, green/white/blue, among several others easily cognizable by those of skill in the art.

[0023] Of course, certain characteristic colors themselves can assist the recognition of road signs from a scene. For example, a shade of yellow depicts road hazard warnings and advisories, white signs indicate speed and permitted lane change maneuver data, red signs indicate prohibited traffic activity, etc. Furthermore, since only a single font is approved for on-sign text messages in the U.S., character recognition techniques (e.g., OCR) can be applied to ensure accurate identification of traffic control signage as the objects of interest in a videostream. Therefore a neural network as taught herein, trained only on a few sets of image data including those visual characteristics of objects of interest such as color, reflectance, fluorescence, shape, and location with respect to a vehicle right-of-way, operates to accurately identify the scenes in an economical and rapid manner. In addition, known line extracting algorithms, line completion (or “growing”) routines, and readily available morphology techniques may be used to enhance the recognition processing without adding significant additional processing overhead.

[0024] In a general application of the present invention, a conclusion may be drawn regarding whether object(s) appearing in a sequence of video data are fabricated by humans or naturally generated by other than manual processing. In this class of applications the present invention can be applied to enhance the success of search and rescue missions where personnel and vehicles (or portions of vehicles) may be randomly distributed throughout a large area of “natural materials”. Likewise, the method taught in the present disclosure finds application in undersea, terrestrial, and extra-terrestrial investigations wherein certain “structured” foreign (artificial or man-made) materials present in a scene of interest might only occur very infrequently over a very large sample of videostream (or similar) data. The present invention operates as an efficient graphic-based search engine too. The task of identifying and locating specific objects in huge amounts of video data, such as searching for missile silos, tanks, or other potential threats depicted in images captured from remote sensing satellites or air vehicles, readily benefits from the automated image processing techniques taught, enabled, and disclosed herein.

[0025] A person of skill in the art will of course recognize myriad applications of the invention taught herein beyond the repetitive object identification, fabricated materials identification, and navigation examples recited above. These and other embodiments of the present invention shall be further described herein with reference to the drawings appended hereto.

[0026] The following figures are not drawn to scale and only detail a few representative embodiments of the present invention; more embodiments and equivalents of the representative embodiments depicted herein are easily ascertainable by persons of skill in the art.

DESCRIPTION OF THE DRAWINGS

[0027] FIG. 1 depicts an embodiment of the present invention illustrated as a block diagram wherein video image frame segments feed into a set of at least two extraction filters which have outputs that are logically “OR'd”; each non-useful image frame is discarded and regions of useful image frames are inspected; the regions satisfying sign criteria are classified and saved with the original frame number; and, if desired, a correlated sign list linked to camera, frame number, location, or orientation is produced and linked to at least one actual bitmapped image frame portion depicting the sign.

[0028] FIGS. 2A, 2B, and 2C depict a portion of an image frame wherein parts of the edges of a potential object are obscured (in ghost), or otherwise unavailable, in an image frame (2A), the same image frame portion undergoing edge extraction and line completion (2B), and the final enhanced features of the potential object (2C).

[0029] FIG. 3A depicts a plan view of a propelled image acquisition vehicle system and FIG. 3B depicts a vehicle having multiple weather hardened camera ports for recording features adjacent a vehicle right-of-way (each side, above, on the surface of the right-of-way, and a rearward view of the recording path).

[0030] FIG. 4 depicts a processing system for classifying road signs appearing in image data from multiple imaging capture devices wherein capture devices SYS1 through SYS4 each utilize a unique recognition filter specifically developed for said capture device (focal/optics, recording orientation, and camera/vehicle location specific for each imaging system).

[0031] FIG. 5 depicts a plan view of a preferred camera arrangement for use in practicing the present invention wherein two image capture devices that record road signs are directed in the direction of travel of the vehicle.

[0032] FIG. 6 is an enlarged view of a portion of a typical road sign depicting a border region, an interior portion of solid color, and the outline border appearing thereon.

[0033] FIGS. 7A-F depict the general outline and shape of six relatively common road signs.

DESCRIPTION OF PREFERRED EMBODIMENT

[0034] The present invention is first described primarily with reference to FIG. 1 wherein an image frame 11 has captured a portion of a road side scene which basically is the same as a field of view 11 of camera 10, the scene being conveyed via optics 12 to a focal plane of camera imaging means 10 which preferably includes suitable digital imaging electronics as is known and used in the art. The scene depicted in frame 11 (or subsequent frames 22, 33, 44, etc.) of FIG. 4B can contain several objects (A, B, C, D) of interest disposed therein. In one embodiment of the present invention, a single imaging means 10 is directed toward the road side from the vehicle 46 as the vehicle navigates normal traffic lanes of a roadway. The imaging means 10 often comprises several imaging devices 20,30,40 wherein each possibly overlaps other camera(s) and is directed toward a slightly different field of view 22,33,44, respectively (see FIG. 4B) than the other imaging devices comprising imaging means 10 at objects A-D, etc. with sufficient clarity upon the suitable digital imaging electronics of imaging means 10 to derive chromatic and edge details from said electronics. The imaging means 10 can be multiple image means having a variety of optical properties (e.g., focal lengths, aperture settings, frame capture rate) tuned to capture preselected portions of a scene of interest. When multiple image means 10 are used to capture image frames each said image means 10 is electronically coupled to the processing system of the present invention and each is tuned with its own unique processing method(s) to optimize the quality/accuracy of the outputs therefrom so that all frame data not related to “images” of potential objects are filtered and then “images” of said objects are compared in an “object search space” so that all qualified images that correspond to a single object can be linked to said single object regardless of which discrete imaging means 10 originally recorded the image(s) of the object. In this embodiment, a dedicated CPU for each imaging means 10 is provided to speed processing toward “real time” processing rates. Furthermore, said dedicated CPU could be provided from a single box CPU having many separate CPUs disposed therein, a networked group of linked CPUs, or a global network of linked CPUs (e.g., world wide web or internet-type network).

[0035] Typically, imaging means 10,20,30,40 are tuned so that approximately between five and forty percent (5-40%) of the available two dimensional image frame space is captured per single object when said single object is “fully depicted” in a given frame. If an object of known size thus fills a known portion of the field of view of an imaging means 10, a rough estimate of actual distance from the camera may be calculated (and this data can be used if needed to assist the process of accurately finding the actual position of a recognized object in a scene).
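The rough distance estimate mentioned above can be illustrated with the standard pinhole-camera relation, distance = focal length × real height / image height. The following is a minimal sketch only; the function name, the use of a focal length expressed in pixels, and the sample values are assumptions made for illustration and do not appear in the disclosure.

```python
def estimate_distance_m(real_height_m: float,
                        pixel_height: float,
                        focal_length_px: float) -> float:
    """Pinhole-camera range estimate (hypothetical helper).

    real_height_m   -- known physical height of the object, in meters
    pixel_height    -- height of the object's image, in pixels
    focal_length_px -- camera focal length expressed in pixels
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_length_px * real_height_m / pixel_height


# Assumed values: a 0.75 m sign imaged 50 px tall by a camera
# whose focal length is 1000 px.
distance = estimate_distance_m(0.75, 50, 1000)
```

The estimate is only as good as the assumed object size and focal length, which is consistent with the text describing it as "rough".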

[0036] The present invention operates sufficiently well under ambient lighting conditions when the imaging means 10 captures radiation from the visible spectrum, although scene illumination may be augmented with a source of illumination directed toward the scene of interest in order to diminish the effect of poor illumination and illumination variability among images of objects. The present invention is not dependent upon said additional source of illumination, but if one is used the source of illumination should be chosen to elicit a maximum visual response from a surface of objects of interest. For example, the source of illumination could be a high-intensity halogen bulb designed to create a maximum reflected signal from a surface of an object wherein the object is one of a class of traffic control signs. In this way, at least one object present in a scene likely distinctly appears in a portion of two or more frames. Then a variety of logically OR'd extraction routines and filters extract image portions that exhibit said differentiable characteristics (which may be a slightly different set of characteristics than would be used for non-illuminated recording). As in the other embodiments, the video data stream is preferably linked to data for each imaging device (e.g., absolute position via GPS or d-GPS transponder/receiver, or relative position via INS systems, or a combination of GPS and INS systems, etc.) so the location of each identified object is known or at least susceptible to accurate calculation.

[0037] In one manner of practicing the invention, location data is synchronized to the video data from the imaging means 10 so that location and image information are cross-referenced to correlate the location of the object using known techniques of triangulation and assuming a set of known camera parameters. As described further herein, triangulation may be replaced or augmented if the camera recording perspective angle is a known quantity relative to the vehicle recording path and the vehicle location is known (and by applying known camera parameter values, such as focal length). Furthermore, if the pixel height or aspect ratio (herein used to describe area of coverage measures) of confirmed objects is known, the location of the object can be deduced and recorded. Thus, this data is synchronized so that each image frame may be processed or reviewed in the context of the recording camera which originally captured the image and the frame number from which a bitmapped portion was captured, and so that the location of the vehicle (or exact location of each camera conveyed by the vehicle) may be quickly retrieved.
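As one illustration of the "known techniques of triangulation" referred to above, the sketch below intersects two bearing rays taken from two recorded camera positions in a flat, local x/y coordinate frame. The function name, the planar coordinate frame, and the convention that bearings are measured in radians from the x axis are assumptions for this example; the disclosure does not prescribe a particular formulation.

```python
import math


def triangulate(p1, bearing1, p2, bearing2):
    """Intersect two bearing rays (hypothetical helper).

    p1, p2             -- (x, y) camera positions in a local planar frame
    bearing1, bearing2 -- direction to the object from each position,
                          in radians measured from the x axis
    Returns the (x, y) point where the two rays cross.
    """
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    # 2D cross product of the two directions; zero means parallel rays.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        raise ValueError("bearings are parallel; no unique intersection")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Distance along the first ray at which it meets the second ray.
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# Assumed scenario: the same sign sighted from two vehicle positions 5 m apart.
sign_xy = triangulate((0.0, 0.0), math.atan2(5, 10),
                      (5.0, 0.0), math.atan2(5, 5))
```

In practice the bearing would itself be derived from the pixel column of the object and the camera parameters, and more than two sightings could be combined to reduce error.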

[0038] A location matrix corresponding to the location of a confirmed object may be built from the output data sets of the present invention. At several points in the processing of the image frames, manual inspection, interaction, and/or intervention may be sought to further confirm the accuracy of the present invention as to the presence or absence of a potential object therein. Thus, an additional output may be stored or immediately sent to a human user which includes each “questionable” identification of an object wherein each said questionable identification event may be quickly, although manually, reviewed with reference to this data (and a simple “confirm” or “fail” flag set by a human user).

[0039] The preferred rate of video capture for digital moving cameras used in conjunction with the present invention is thirty (30) frames per second, although still photos and faster or substantially slower image capture rates can be successfully used in conjunction with the present invention, particularly if the velocity of the recording vehicle can be adapted for capture rates optimized for the recording apparatus. A high image capture rate creates latitude for later sampling techniques which discard large percentages of said frames in order to find a preselected level of distinguishing features among the images within the frames that are not discarded.

[0040] Road side objects frequently are partially obscured from the roadway by other vehicles and/or roadside features such as trees, signage, hedges, etc. High frame rates enable the present system to ignore these more difficult scenes (and corresponding image frames) with little downside. Filtering may be done here to correct for known camera irregularities such as lens distortion, color gamut recording deficiencies, lens scratches, etc. These may be determined by recording a known camera target (real objects, not just calibration plates). Because the imaging vehicle is moving, its motion causes a certain degree of blurring of many objects in many frames. A sharpening filter which seeks to preserve edges is preferably used to overcome this often encountered vehicle-induced recording error. This filter may benefit from, but does not require, a priori knowledge of the motion flow of pixels, which will remain fairly constant in both direction and magnitude in the case of a vehicle-based recording platform.

[0041] The frame buffer 44 is preferably capable of storing 24 bit color representative of the object 40 represented in an RGB color space and the number of significant color bits should be five (5) or greater. The frame buffer 44 is subjected to an edge detector utility 55 as known in the art (and which can be directly coded as assembly language code as a simple mathematical function), such as the Sobel extractor. The inventors note that the convolving filters used herewith (and in fact the entire class of convolving filters) may be simply coded in assembly language and benefit greatly from SIMD instructions such as MMX as used in the Pentium II computer processors of Intel Corporation, of Santa Clara, Calif., U.S.A., which speeds processing and eliminates a margin of processing overhead. The frame buffer is separated into two channels of data, a first data set of edge data and a second data set of color data. As earlier mentioned, only a small subset of high-reflectance colors are typically authorized for use as road sign colors, and furthermore, the set of colors authorized can be generally characterized as non-typical colors (i.e., occurring only in conjunction with objects of interest).
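For readers unfamiliar with the Sobel extractor named above, the following sketch computes the Sobel gradient magnitude of a small grayscale image held as a list of rows. It is written in plain Python for readability rather than in the assembly/SIMD form the text contemplates; the function name and the choice to leave border pixels at zero are assumptions for this example.

```python
import math


def sobel_magnitude(gray):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    gray is a list of equal-length rows of numbers. Border pixels,
    which lack a full 3x3 neighborhood, are left at 0.0.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical step edge: dark on the left, bright on the right.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(image)
```

Only the columns adjacent to the step respond; the flat regions on either side produce zero output, which is what allows the edge channel to be separated from the color channel as described.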

[0042] Information about a series of at least two (2) images in different image frames is needed (prior to the images being “combined” into a single confirmed object) and the information about each confirmed object is preferably saved in a parametric data format (i.e., as scaleable data).

[0043] Either a thresholding routine, a fuzzy color set, or a neural network can be used to extract the relevant color-set data. The effect is simply to alter the range of colors that will successfully activate a flag or marker related to the color data set so that small variations in color of the sign (due to different illumination of images of the same object, UV exposure, different colorants, different manufacturing dates for the colorant, etc.) do not tend to create erroneous results. Accordingly, thresholding red to trip just when stop sign-red is detected, in combination with the rule set of relative location of different types of signs, helps eliminate pseudo-signs (something that looks something like a sign of interest, but isn't). In the event that a portion of a sign is obscured (either by another sign, or by unrelated objects), just two (2) opposing corners for four-sided signs, and three (3) corners that do not share a common edge for six and eight-sided signs (as exhibited by two intersecting edges which meet at a set of detectable, distinctive characteristic angles), are typically required to identify whether an appropriate edge of a real sign has been encountered. A special aspect of signs exploited by the present invention is that most road signs have a thin, bold strip around substantially the entire periphery of the face of the sign. This bold periphery strip is often interrupted where small sign indicia are typically printed. Thus the characteristic striping operates as a very useful feature when reliably detected, as is possible with the present invention, and in practical terms this border offers two (2) opportunities to capture an edge set having the proper spatial and angular relationships of an object, thereby increasing the likelihood that a sign having a typical border will be accurately and rapidly recognized by the present inventive system.
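The thresholding option described above can be sketched as a color-set membership test in which each sign color is a tolerant range of hues rather than a single value, so that faded or differently lit instances of the same color still trip the same flag. The hue ranges, saturation and brightness cut-offs, and names below are illustrative assumptions only; they are not values taken from the disclosure or from any signage standard.

```python
import colorsys

# Hypothetical hue ranges (fractions of the color wheel) for a few sign
# colors. Red wraps around the ends of the hue scale, so it has two ranges.
COLOR_SETS = {
    "red": [(0.0, 0.04), (0.96, 1.0)],
    "yellow": [(0.11, 0.19)],
    "green": [(0.28, 0.45)],
    "blue": [(0.55, 0.70)],
}


def classify_pixel(r, g, b, min_sat=0.5, min_val=0.3):
    """Return the name of the color set an 8-bit RGB pixel falls in, or None.

    Pixels that are too gray (low saturation) or too dark (low value)
    are rejected before any hue comparison is made.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if s < min_sat or v < min_val:
        return None
    for name, ranges in COLOR_SETS.items():
        if any(lo <= h <= hi for lo, hi in ranges):
            return name
    return None


# Two different shades of red land in the same color set.
shade_a = classify_pixel(200, 20, 30)
shade_b = classify_pixel(180, 30, 25)
```

Widening or narrowing the ranges in `COLOR_SETS` is the code analogue of altering "the range of colors that will successfully activate a flag"; a fuzzy set or neural network, the other options named in the text, would replace the hard range test with a graded one.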

[0044] Then, if the image illumination is sufficient for color detection, the type of road sign can be determined by filtering the color data set with the inventive hysteresis filter described herein. This allows detection of signs appearing adjacent to red stop signs that might otherwise appear as another color to the camera (and perhaps to a camera operator). Because in the U.S. informational signs are typically white or blue, directional and jurisdictional signs are typically green, and caution signs are typically yellow, which all produce relatively subtle discontinuities compared to red stop signs, detecting the subtleties among the former presents a difficulty economically solved by the present invention. In conjunction with the color data set, and given an assumption that the videostream depicting the road side signage was captured by a vehicle navigating in a normal traffic lane, the location of a road sign (in a temporal and literal sense) in successive frames helps indicate precisely the type of sign encountered. Further, the inventive system herein described takes advantage of the limited fonts used for text appearing on road signs as well as the limited types of graphical icons depicted on certain signs. This type of sign indicia can be put into a normalized orientation and simple OCR or template-matching techniques readily and successfully applied. These techniques work especially well in cooperation with the present invention because the segmentation and normalization routines have removed non-sign background features and the size and position of the sign indicia are not variant. With respect to road signs painted on the surface of a road, the color, message, shape, sequence, and location relative to a typical vehicle allow rapid and accurate identification using the present invention. In particular, use of a texture segmenting routine practically causes the entire road to fail to record a meaningful value and the “sign” on the road becomes readily apparent (e.g., stripes, lines, messages, arrows, etc.).

[0045] Once an image (portion of an image frame) has been created and stored in the image list database, the area of the sign is marked in the frame. This marked region is the perimeter eroded at least one full pixel. This area is not considered to be part of any other sign. The scene is then reprocessed after re-initializing all the adaptive parameters and hysteresis filters; surround inputs are also changed on the Nth pass from the N−1 pass. For example, after an image portion depicting a stop sign is marked and essentially removed from the image frame, during later re-processing of the image frame the pixels corresponding to said marked region are set to a null value. This aids later processing techniques that compare a number of adjacent pixels in order to identify boundaries of signs. Thus a potential source of bias, namely prior pixel values from the originally recorded image, is removed during later processing to the extent that the values of a set of pixels in said removed area are needed for boundary or edge detection. This single hysteresis filter therefore is highly adaptable and useful in practicing the present invention since it operates effectively in the growing of areas exhibiting a common color set (or “bucket” of color, defined as the subtle variety of colors commonly observed as a single road sign color as a result of changing viewing conditions) and it operates effectively as a progressively finer hysteresis filter wherein the discontinuities become less readily apparent. For example, a red sign creates a relatively sharp discontinuity relative to almost all background colors. Once that sign has been identified as an image portion of interest and said image portion removed, later full image frame processing for other discontinuities will likely need to accurately discern between shades of white and blue, yellow, or green. In these cases, the technique just described greatly enhances the ability to rapidly extract a variety of signs present in even a single image frame using just the inventive hysteresis filter.
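The disclosure does not give the internals of its hysteresis filter, so the following is not that filter. It is a sketch of generic two-threshold hysteresis region growing, a conventional technique substituted here purely to illustrate the two ideas in the paragraph: growing an area of common color outward from a confident seed, and then setting the marked region to a null value before a later pass. The per-pixel "score" (for instance, a degree of color-set membership), the thresholds, and all names are assumptions.

```python
from collections import deque


def grow_region(score, high, low):
    """Generic hysteresis growing (not the patented filter).

    Pixels scoring >= high are seeds; the region then spreads through
    4-connected neighbors scoring >= low. Returns a set of (row, col).
    """
    h, w = len(score), len(score[0])
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if score[y][x] >= high)
    region = set(queue)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and (ny, nx) not in region
                    and score[ny][nx] >= low):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region


def null_region(frame, region, null_value=None):
    """Overwrite the marked pixels in place so later passes ignore them."""
    for y, x in region:
        frame[y][x] = null_value


# Assumed membership scores for one color set over a tiny 4x4 frame.
scores = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.9, 0.1],
    [0.1, 0.5, 0.1, 0.1],
    [0.6, 0.1, 0.1, 0.1],
]
region = grow_region(scores, high=0.8, low=0.5)
frame = [row[:] for row in scores]
null_region(frame, region)
```

The isolated pixel at the lower left passes the low threshold but is not connected to any seed, so it is excluded. A subsequent pass over `frame` with thresholds tuned to a subtler color set would then see nulls where the first sign was.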

[0046] Two sets of data, the edge data and the color data, are fed to an input node of a preferably three layer neural network which adds an entry to a 3D structure based on the location of a portion of the frame buffer 44 presently being processed. In effect, the 2D image contained in any given frame buffer is processed and compared to other frame buffers to create 3D regions of interest (ROI). In this context, the ROI refers to a fabricated space which contains a length of video so that a number of possible objects can be associated on the basis of color, edge features, location relative to other possible objects, etc. Another way to consider the ROI is as a volumetric entity that has position and size both specified in a 3D space. This ROI is used as a search query into the set of all images. They are searched based on inclusion in a predefined ROI. This database includes all the “images” and so this searching occurs after the processing of all the data (i.e., extracting and filtering of a set or segment of image frames). This data may have been collected at different times including different seasonal conditions. The intersection of the sets of signs present will be identified as signs and can be identified with special processing appropriate for such signs (e.g., winter parking signs, temporary construction signs, detour signs, etc.). Regardless of the number or types of classes for the signs, the database is stored as an octree or any comparable searchable 3D memory structure.
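The idea of using an ROI as a search query "based on inclusion" can be sketched as a test of whether each stored image record falls inside a cubic volume. The record fields, the cubic shape of the ROI, and the linear scan are assumptions for illustration; the text indicates the records would actually be held in an octree or comparable 3D structure, which would answer the same query without visiting every record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """One detected image of a possible sign (hypothetical fields)."""
    image_id: int
    sign_class: str
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class ROI:
    """A cubic region of interest given by its center and half-width."""
    cx: float
    cy: float
    cz: float
    half: float

    def contains(self, rec: ImageRecord) -> bool:
        return (abs(rec.x - self.cx) <= self.half
                and abs(rec.y - self.cy) <= self.half
                and abs(rec.z - self.cz) <= self.half)


def query_roi(records, roi):
    """Return the records lying inside the ROI (simple linear scan)."""
    return [r for r in records if roi.contains(r)]


records = [
    ImageRecord(1, "stop", 9.0, 1.0, 2.0),
    ImageRecord(2, "stop", 20.0, 0.0, 2.0),
    ImageRecord(3, "yield", 12.0, -2.0, 4.0),
]
hits = query_roi(records, ROI(cx=10.0, cy=0.0, cz=2.0, half=3.0))
```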

[0047] During operation of the present invention all detected images of signs are assigned to an “image list” and, by sequentially attempting to match “closely separated” pairs of images in an octree space of common classification, a “sign list” is generated. Once two or more members of the image list are matched, or “confirmed” as a single actual sign, each image is removed from further searching/pairing techniques. A dynamically-sized region of interest (ROI), which can be interpreted as a voxel, or volume pixel, populated by several images for each actual sign, is used to organize the image list into a searchable space that “advances” down the original recorded vehicle roadway as transformed to many discrete images of the actual signs. Thus, the ROI is continually advanced forward within the relative reference frame of the vehicle and after each pair is correlated to a single sign, their corresponding records in the image list are removed. During this process, where a single orphan image (non-confirmed, possible sign) appears it is culled to an orphan list which is then subjected to a larger search space than the first ROI to try to find a correlation of the single image to another corresponding image and/or ported to a human user for interpretation. This may result in the image being merged into a sign using relaxed matching constraints because the absolute position of the sign, the known arc of possible positions, and the use of simple depth sorting can “prove” they are the same sign. This can be done even when the intersection of the sets of shared spatial features is empty. At this point the GPS or location database can be consulted to further aid identification. Manual review of a “best” selected and saved bitmap image of the unidentified object further enhances the likelihood of accurate identification and classification of the image object. Presently the inventive system saves every image but culls all but the eight (8) or so having the highest magnitude signal from the initial filter sets.
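The image-list to sign-list step can be sketched as a greedy grouping: images of the same classification that lie within a small separation of one another are confirmed as one sign, and anything left unpaired is moved to an orphan list for a wider search or manual review. The record layout, the planar positions, the distance threshold, and the greedy order are all assumptions for this example rather than details given in the disclosure.

```python
import math


def build_sign_list(image_list, max_sep):
    """Group closely separated, same-class images into confirmed signs.

    image_list -- list of dicts with keys 'id', 'cls', and 'pos' (x, y)
    max_sep    -- largest separation at which two images may be paired
    Returns (signs, orphans): signs is a list of groups of two or more
    images; orphans is a list of images that matched nothing.
    """
    unmatched = list(image_list)
    signs, orphans = [], []
    while unmatched:
        seed = unmatched.pop(0)
        group, rest = [seed], []
        for other in unmatched:
            same_class = other["cls"] == seed["cls"]
            close = math.dist(other["pos"], seed["pos"]) <= max_sep
            if same_class and close:
                group.append(other)   # confirmed: remove from further pairing
            else:
                rest.append(other)
        unmatched = rest
        if len(group) >= 2:
            signs.append(group)
        else:
            orphans.append(seed)      # candidate for a larger search space
    return signs, orphans


images = [
    {"id": 1, "cls": "stop", "pos": (0.0, 0.0)},
    {"id": 2, "cls": "stop", "pos": (0.5, 0.2)},
    {"id": 3, "cls": "yield", "pos": (0.1, 0.1)},
    {"id": 4, "cls": "stop", "pos": (50.0, 0.0)},
]
signs, orphans = build_sign_list(images, max_sep=2.0)
```

Calling the same function again on `orphans` with a larger `max_sep` would correspond loosely to subjecting the orphan list to "a larger search space than the first ROI".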

[0048] Preferably, there are three (3) basic filters used to recognize a portion of an image frame as a sign which deserves to have membership in the “image list”: edge intersection criteria, which are applied albeit relaxed (the edges are transformed into “lines of best fit” in Hough space by using adaptive sizing, or “buckets”) so that valid edge intersections exhibiting “sign-ness” are found; color-set membership; and neural net spatial characteristics. As noted above, the Fourier transform recognition techniques suffer from a reliance on the frequency domain, where many background objects and non-objects exhibit sign-ness, as opposed to the spatial domain used beneficially herein, where such potential errors (or false positives) are less frequently encountered. Using a compressed histogram of the color of the face of a sign results in a highly compressed bitmap file, and if a boundary edge of the sign is reduced so that only a common shade (or color) is present, the compression of the image frame portion can be very efficient. The inventors observe that even very small (1-2 pixels) spots of detectable color can be used for relatively long range confirmation of object color.

[0049] The inventors suggest that up to thirty to forty (30-40) images per sign are often available and adequate to scrutinize, but at a minimum only one (1) reasonable depiction of an actual sign is required to perform the present inventive technique (if object size and camera location are known) and only approximately three (3) images are needed to provide extremely high identification accuracy rates. In a general embodiment, the present invention is configured as a graphic-based search engine that can scrutinize an extremely large number of frames of image data to log just a desired single object recognition event.

[0050] To reiterate, the coined term “sign-ness” is used herein to describe those differentiable characteristics of signs versus characteristics of the vast majority of other things depicted in an image frame that are used to recognize signs without use of reference targets, templates, or known image capture conditions. Thus, a general embodiment of the present invention is herein expressly covered by the disclosure herein in which the presence of any object of interest, or portion of such an object, can be discretely recognized provided said object of interest comprises a discrete set of differentiable qualities in comparison to other elements of a scene of interest. To paraphrase, each image frame is discarded if it exhibits little or no “sign-ness” because the image frame either does not hold an image of a sign or holds insufficient detail of a sign to be useful. Stated a different way, the present invention uses partial function weight analysis techniques to discard useless frames (e.g., frames without a sufficient amount of a differentiable color, edge definition, or other differentiable feature of a desired object) and/or a relaxed confidence interval that strongly weights approximate minimum basis function elements known to produce a correlation to a real world object.

[0051] The concept of further classification of identified objects can include capture and analysis of text and other indicia printed on an object by using suitable normalization routines or extractors and specifically include well known OCR and template-based matching techniques. These routines and extractor engines allow for size, position, and rotational variances of said indicia. Thus, for example, this allows classification of objects to a much more detailed level. In the sign-finding embodiment, this means that detailed information can be captured and compared. This allows sorting or searching for all instances where the phrase “Nicollet Avenue” appears, where the phrase appears on corner street signs versus directional signs, or wherein all signs identified and located on a street named Nicollet Avenue can be rapidly retrieved, displayed, and/or conveyed.

[0052] The inventors have produced embodiments of the present invention using relatively cheap (in terms of processing overhead) functions in order to rapidly and efficiently process the video data stream. Initial screening may be done on a scaled-down version of the frame buffer. Later filters may be run on the full size data or even super sampled versions of the full size data. Thus, certain functions applied to the video data stream quickly and easily indicate that one or more image frames should be discarded without further processing or inspection, and their use is promoted as an expedient given the present state and cost of processing power. For example, if only standard stop signs need to be recognized and their position logged, shape is a key distinguishing, dispositive feature and a search function based solely on shape will adequately recognize a stop sign even if the video data stream depicts only the unpainted rear of the stop sign.

[0053] The neural network preferably used in conjunction with the present invention is a three layer feed forward neural network having a single input layer, hidden layer, and an output layer. The back propagation data for training the network typically utilize random weights for the initial training sets applied to assist the neural network in learning the characteristics of the set of objects to be identified, and the training sets preferably consist of sets with and without objects depicted therein, real-world sets, and worst-case sets. Those nodes of the neural network used to encode important spatial features will vary proportionally to the input resolution of the frame buffer 44 and are dynamically reconfigurable to any resolution. The neural network needs to learn size invariance, which is typically a tough problem for neural networks, and thus the training sets assist the neural network in distinguishing a “little” from a “big” object and matching them based on shape (the object seems to grow in the frame buffer as it nears the image acquisition apparatus). Size variation is further controlled by cutting off recognition of small (less than 5% of frame) images and also by using a unique neural network for each camera. Camera orientation and focus produce remarkably similar size views, particularly on side-facing cameras, because of their approximately orthogonal orientation to the direction of travel and the signs' closeness to the road on which the vehicle is traveling. The neural network preferably uses what are known as convex sets (which exhibit the ability to distinguish between information sets given only a single (or at most a few) select criteria). In the preferred embodiment, shape and color, color edges, color differences, corners, ellipticity, etc. of the images identified as potential objects are used to create this differentiability among signs. As earlier noted, when more than one image acquisition means 10 is used for a single scene of interest, each image acquisition means 10 needs to have a separate neural network trained on the types of image frames produced by that image acquisition means.

[0054] Hexagonal, rectangular, and diamond shapes are preferably encoded in the training sets for the neural network so that an n-feature object may be recognized without any direct relationship to only color, shape, and/or edge rotation.

[0055] The principles of “morphology” are preferably applied to dilate and erode a detected sign portion to confirm that the object has an acceptable aspect ratio (circularity or ellipticity, depending on the number of sides), which is another differentiable characteristic of road signs used to confirm recognition events. These can be described as “edge chain” following where edge descriptors are listed and connected and extended in attempts to complete edges that correspond to an actual edge depicted in a frame. Morphology is thus used to get the “basic shape” of an object to be classified even if there are some intervening colored pixels that do not conform to a preselected color-set for a given class or type of sign. In the preferred embodiment, a color data set can begin as a single pixel of a recognizable color belonging to the subset of acceptable road sign colors and the morphology principles are used to determine shape based on at least a four (4) pixel height and a ten (10) pixel width. The frame, or border stripe of most signs, has to decompose to the orientation transformation of the small templar (i.e., they must share a common large-size shape in a later frame and must decompose to a common small-size templar feature, typically at a viewing horizon).
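The dilate-then-erode operation mentioned above (a morphological "closing") can be sketched on a small binary mask as follows. It fills in a stray non-conforming pixel inside a detected region so that the basic shape, and from it a bounding-box aspect ratio, can be measured. The 3x3 neighborhood, the treatment of out-of-frame pixels as background, and the function names are assumptions for illustration.

```python
def _window(mask, y, x):
    """3x3 neighborhood of (y, x); pixels outside the mask count as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[j][i] if 0 <= j < h and 0 <= i < w else 0
            for j in (y - 1, y, y + 1) for i in (x - 1, x, x + 1)]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighborhood is 1."""
    return [[int(any(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def erode(mask):
    """A pixel stays 1 only if its whole 3x3 neighborhood is 1."""
    return [[int(all(_window(mask, y, x))) for x in range(len(mask[0]))]
            for y in range(len(mask))]


def close_mask(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask))


def aspect_ratio(mask):
    """Width divided by height of the bounding box of the set pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (max(xs) - min(xs) + 1) / (max(ys) - min(ys) + 1)


# A 3x3 block of sign-colored pixels with one non-conforming pixel inside.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
closed = close_mask(mask)
```

After closing, the interior hole is filled while the outer extent of the block is unchanged, so the measured aspect ratio reflects the shape of the sign face rather than the stray pixel.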

[0056] Furthermore, texture “segmentation” as known in the art can be applied to an image, particularly if one or more line and/or edge filters fail to supply an output value of significant magnitude. One feature of texture segmentation is that very large features of many image frames (the road itself, buildings, walls, and the sky) all disappear, or fail to record a meaningful output, under most texture segmentation routines.

[0057] Referring now to FIGS. 2A, 2B, and 2C, these depict a portion of an image frame wherein parts of the edges of a potential object are obscured (in ghost), or otherwise unavailable, in an image frame (2A), the same image frame portion undergoing edge extraction and line completion (2B), and the final enhanced features of the potential object (2C).

[0058] Referring now to FIG. 3A and FIG. 3B, each depicts a propelled image acquisition vehicle 46 conveying imaging systems 10,20,30,40, each of which preferably comprises a unique camera tuned to optimally record road signs and other featured objects adjacent a vehicle right-of-way. While two cameras are perceived as the best by the inventors, the present invention operates adequately with several cameras each covering at least those objects on each side of the road, above the road surface, on the surface of the road, and a rearward view of the recording path. In alternative embodiments the inventors envision at least two cameras oriented on a vehicle traveling down a railroad right of way in which the processing techniques are trained to recognize the discrete objects of interest that populate the railroad bed, railway intersections, roadway crossings, and adjacent properties without departing from the spirit and strength of the present invention.

[0059] Referring now to FIG. 5, this view depicts a preferred embodiment of the present invention wherein the four imaging devices 10,20,30,40 are combined into a single road sign detection system.

[0060] In summary, in the exemplary road sign identification embodiment, a videostream containing a series of signs in one or more frames is subjected to processing equipment that rapidly applies extraction routines to quickly cull the typically high number of useless images from the useful images. Fortunately, road signs benefit from a simple set of rules regarding the location of signs relative to vehicles on the roadway (left, right, above, and a very limited set of painted-on-road signs and markings), the color of signs (preferably limited to discrete color-sets), the physical size and shape of signs, even the font used on text placed upon signs, indicia color, indicia shape, indicia size, and indicia content, the orientation of the signs (upright and facing oncoming traffic), and the sequence in which the variety of signs are typically encountered by the average vehicle operator. Because of the intended usage of these signs for safety of vehicles these standards are rigidly followed, and furthermore these rules of sign color and placement adjacent vehicle rights of way do not vary much from jurisdiction to jurisdiction; therefore the present invention may be used quickly for a large number of different jurisdictions. Furthermore, pedestrian, cycle, and RV path signage identification may likewise benefit from the present invention. Although the border framing the road sign has been described as one of the most easily recognized features of road signs (and in many cases is dispositive of the issue of whether or not a sign is present in an image frame), the present system operates effectively upon road signs that do not have such a border. If a sign is reclined from normal, only a portion of the border frame (typically just the top edge) is needed to ascertain whether the image portion is a portion of a road sign by creating a normalized representation of the sign. Another such technique applies Bayesian techniques that exploit the fact that the probability of two events occurring at the intersection of the two possibilities. Other techniques are surely known to those of skill in the art.

[0061] Referring to FIG. 6, an optimum image gathering vehicle is depicted having at least two image capture devices directed toward the direction of travel of said vehicle.

[0062] FIGS. 7A-F are views of the outlines of a variety of common standard U.S. road signs.

[0063] Hardware platforms preferred by the inventors include processors having MMX capability (or equivalent), although others can be used in practicing the present invention. One of skill in the art will appreciate that the present apparatus and methods can be used with other filters that are logically OR'd together to rapidly determine “object-ness” of a variety of objects of interest. The differentiable criteria used in conjunction with the present invention can vary with the characteristics of the objects of interest. For road signs, the inventors teach, disclose, and enable use of discrete color-sets, or edges (extracted and/or extended to create a property best described as “rectangularity”), or orientation of a sign to the roadway for only one view of the roadside from a single recording device, or texture, to rapidly discern which image frames deserve further processing. A net effect of this hierarchical strategy is the extremely rapid pace at which image frames that do not immediately create an output signal from one of the filters of the filter set are discarded, so that processing power is applied only to the image frames most likely to contain an object of interest. The inventors suggest that the inventive method herein taught will propel the technology taught, enabled, and claimed herein to become widely available to the public. Thereafter, myriad valuable implementations of the technology presented herein shall become apparent. Other embodiments of the present invention are easily realized following exposure to the teaching herein and each is expressly intended to be covered hereby.

[0064] Further, those embodiments specifically described and illustrated herein are merely that, embodiments of the invention herein described, depicted, enabled and claimed, and should not be used to unduly restrict the scope or breadth of coverage of each patent issuing hereon. Likewise, as noted earlier, the invention taught herein can be applied in many ways to identify and log specific types of objects that populate a scene of interest to assist in vehicle navigation, physical mapping/logging status by object location and type, and identifying linear man-made materials present in a scene generally populated by natural materials.

EXAMPLE 1

[0065] A method of recognizing and determining the location of at least one of a variety of road signs from at least two image frames depicting at least one road sign wherein available known values regarding the location, orientation, and focal length of an image capture device which originally recorded the at least two image frames, comprising the steps of:

[0066] receiving at least two image frames that each depict at least a single common road sign and which correspond to an identifier tag including at least one of the following items: camera number, frame number, camera location coordinates, or camera orientation;

[0067] applying a fuzzy logic color filter to said at least two image frames;

[0068] filtering out and saving image frame portions containing each region that contains at least one preselected color-pair of a pair-set of approved road sign colors; and

[0069] saving to a memory location said image frame portions of the at least a single common road sign depicted in one of said at least two image frames which is linked to at least one of the following items: a camera number, an image frame number, a set of camera location coordinates, or a camera orientation direction used for recording.
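By way of illustration only, the color filtering and saving steps of Example 1 might be sketched as follows. The sketch is not taken from the specification: the reference color coordinates, the linear membership function, the `radius`, `cutoff`, and `min_fraction` parameters, and the `SavedRegion` record are all hypothetical choices, and each pixel is assumed to have already been converted to an (L*, u*, v*) triple.

```python
from dataclasses import dataclass
from math import dist

# Hypothetical reference colors as (L*, u*, v*) triples; illustrative values only.
REFERENCE_COLORS = {
    "red": (53.0, 175.0, 38.0),
    "white": (100.0, 0.0, 0.0),
    "yellow": (97.0, 8.0, 107.0),
    "black": (0.0, 0.0, 0.0),
}
# Hypothetical pair-set of approved road sign color-pairs.
APPROVED_PAIRS = [("red", "white"), ("yellow", "black")]


def membership(pixel, color_name, radius=60.0):
    """Fuzzy degree in [0, 1]: 1 at the reference color, falling linearly to 0 at `radius`."""
    return max(0.0, 1.0 - dist(pixel, REFERENCE_COLORS[color_name]) / radius)


def colors_present(region, cutoff=0.5, min_fraction=0.2):
    """Names of reference colors that cover at least `min_fraction` of the region."""
    if not region:
        return set()
    counts = {name: 0 for name in REFERENCE_COLORS}
    for pixel in region:
        best = max(REFERENCE_COLORS, key=lambda name: membership(pixel, name))
        if membership(pixel, best) >= cutoff:
            counts[best] += 1
    return {name for name, n in counts.items() if n >= min_fraction * len(region)}


def matching_pair(region):
    """First approved color-pair whose two colors are both present, else None."""
    present = colors_present(region)
    for pair in APPROVED_PAIRS:
        if set(pair) <= present:
            return pair
    return None


@dataclass
class SavedRegion:
    camera_number: int   # identifier tag items linked to the saved portion
    frame_number: int
    color_pair: tuple
    pixels: list


def filter_and_save(region, camera_number, frame_number, store):
    """Save the region, linked to its identifier tag, only if it shows an approved pair."""
    pair = matching_pair(region)
    if pair is not None:
        store.append(SavedRegion(camera_number, frame_number, pair, region))
    return pair
```

A region that is mostly red with a white legend would be saved under the ("red", "white") pair, while a region of foliage, which lies far from every reference color, would produce no pair and would not be saved.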

EXAMPLE 2

[0070] A method for recognizing an object and classifying it by type, location, and visual condition from a digitized video segment of image frames comprising the steps of:

[0071] applying two filters to an image frame wherein the two filters each capture at least one differentiable characteristic of the object of interest;

[0072] extracting a first data set and a second data set from said two filters;

[0073] comparing said first data set and said second data set to threshold values;

[0074] discarding said image frame if the first or second data set does not exceed the threshold; and

[0075] adding said image frame to an image frame library of possible images depicting actual objects.
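A minimal sketch of the two-filter screening of Example 2 follows, assuming a frame is held as rows of (r, g, b) tuples. The two filters shown (a crude red-pixel fraction and a horizontal brightness-jump density), their numeric cut-offs, and the threshold values are hypothetical stand-ins for whatever differentiable characteristics an implementer selects.

```python
def red_fraction(frame):
    """First data set: fraction of pixels that are strongly red (hypothetical color filter)."""
    pixels = [p for row in frame for p in row]
    if not pixels:
        return 0.0
    red = sum(1 for r, g, b in pixels if r > 150 and g < 100 and b < 100)
    return red / len(pixels)


def edge_density(frame, jump=50):
    """Second data set: fraction of horizontally adjacent pixel pairs whose
    mean brightness differs by more than `jump` (hypothetical edge filter)."""
    pairs = edges = 0
    for row in frame:
        for left, right in zip(row, row[1:]):
            pairs += 1
            if abs(sum(left) - sum(right)) / 3 > jump:
                edges += 1
    return edges / pairs if pairs else 0.0


def screen_frame(frame, library, color_threshold=0.05, edge_threshold=0.02):
    """Discard the frame if either data set fails its threshold; otherwise
    add it to the library of possible images. Returns True if kept."""
    first = red_fraction(frame)
    second = edge_density(frame)
    if first <= color_threshold or second <= edge_threshold:
        return False
    library.append(frame)
    return True
```

Under these assumptions a frame must pass both filters to enter the library, so a uniformly red frame (no edges) and a uniformly dark frame (no color) are both discarded.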

EXAMPLE 3

[0076] A method for identifying similar objects depicted in at least two bitmap frame buffers of a digital processor, comprising the steps of:

[0077] receiving a digital image frame that corresponds to a unique camera, a camera location, and an image frame reference value;

[0078] applying a set of equally weighted filters to said image frame wherein each of said equally weighted filters creates an output signal adjusted to reflect the magnitude of a different differentiable characteristic of an object of interest;

[0079] OR-ing the resulting output signals from each of the equally weighted filters; and

[0080] saving only those image frames in which at least one of the equally weighted filters produces the output signal having a local maximum value.
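One possible reading of the OR-ing and local maximum steps of Example 3 is sketched below. It assumes, without support in the text, that the "local maximum" is taken across consecutive frames of the videostream and that each filter's output is available as one number per frame; the function names are hypothetical.

```python
def local_maxima(signal):
    """Indices i where signal[i] is strictly greater than both neighbours."""
    return {
        i
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    }


def frames_to_save(filter_outputs):
    """OR the per-filter detections: keep a frame index if any one of the
    equally weighted filters peaks there.

    `filter_outputs` maps a filter name to its per-frame output signal.
    """
    keep = set()
    for signal in filter_outputs.values():
        keep |= local_maxima(signal)
    return sorted(keep)
```

Because the detections are combined by union, no filter is weighted above another: a peak in any single filter's output is sufficient to retain the frame.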

EXAMPLE 4

[0081] A method of identifying traffic control signs adjacent a vehicle right of way, comprising the steps of:

[0082] receiving a digital videostream composed of individual image frames depicting a roadway as viewed from a vehicle traversing said roadway;

[0083] iteratively comparing bitmap frames of said videostream to determine if a first bitmap pixel set matches a second bitmap pixel set in terms of reflectance, color, or shape of an object depicted therein;

[0084] placing all members of the first pixel set and the second pixel set that match each other in an identified field of a database structure;

[0085] synchronizing a geo-positioning signal to the identified field; and

[0086] storing a representative bitmap image of either the first pixel set or the second pixel set in conjunction with the geo-positioning signal.
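The grouping and synchronizing steps of Example 4 might be sketched as follows. Everything named here is hypothetical: the `PixelSet` and `SignField` records, the use of color and shape labels as the matching test, the choice of the first member as the representative bitmap, and nearest-in-time lookup as the means of synchronizing the geo-positioning signal. The stored position is that of the recording vehicle at capture time, not a computed sign location.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class PixelSet:
    timestamp: float  # hypothetical frame capture time, in seconds
    color: str
    shape: str
    bitmap: bytes


@dataclass
class SignField:
    members: list = field(default_factory=list)
    position: tuple = None        # (lat, lon) of the camera at capture time
    representative: bytes = b""


def matches(a, b):
    """Assumed matching test: same color label and same shape label."""
    return a.color == b.color and a.shape == b.shape


def nearest_fix(gps_fixes, timestamp):
    """gps_fixes: non-empty list of (time, lat, lon) sorted by time.
    Returns the (lat, lon) of the fix closest in time to `timestamp`."""
    times = [t for t, _, _ in gps_fixes]
    i = bisect_left(times, timestamp)
    candidates = gps_fixes[max(0, i - 1): i + 1]
    _, lat, lon = min(candidates, key=lambda fix: abs(fix[0] - timestamp))
    return (lat, lon)


def build_fields(pixel_sets, gps_fixes):
    """Group consecutive matching pixel sets into one field, then attach a
    geo-position and a representative bitmap to each field."""
    fields = []
    for current in pixel_sets:
        if fields and matches(fields[-1].members[-1], current):
            fields[-1].members.append(current)
        else:
            fields.append(SignField(members=[current]))
    for sign_field in fields:
        first = sign_field.members[0]
        sign_field.position = nearest_fix(gps_fixes, first.timestamp)
        sign_field.representative = first.bitmap
    return fields
```

With this layout, two successive frames showing the same red octagon fall into a single field, and only one bitmap and one position are stored for it.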

EXAMPLE 5

[0087] A method of rapidly recognizing road signs depicted in at least one frame of a digital videosignal, comprising the steps of:

[0088] applying at least two equally weighted filters to at least one frame of a digital depiction of a road side scene so that for each of the at least two equally weighted filters a discrete output value is obtained;

[0089] comparing the discrete output value for each respective said at least two equally weighted filters and if a discrete output of at least one of said at least two equally weighted filters does not exceed a reference value then discarding the at least one frame of digital videosignal, but if one said discrete output exceeds a reference value; and then

[0090] setting a road sign “image present” flag for said at least one frame of a digital videosignal;

[0091] further comprising the steps of

[0092] saving a bitmap image of a portion of said at least one frame of digital videosignal, and recording a location data metric corresponding to the location of the camera which originally recorded the at least one frame of digital videosignal; and

[0093] wherein the location data metric further comprises the direction the camera was facing while recording, the focal length of the camera, and the location of the camera as recorded by at least one global positioning device.

[0094] Although the present invention has been described with reference to discrete embodiments, no such limitation is to be read into the claims as they alone define the metes and bounds of the invention disclosed and enabled herein. One of skill in the art will recognize certain insubstantial modifications, minor substitutions, and slight alterations of the apparatus and method claimed herein that nonetheless embody the spirit and essence of the claimed invention without departing from the scope of the following claims.

What is claimed is:
 1. A method of recognizing and determining the location of at least one of a variety of road signs from at least two image frames depicting at least one road sign wherein available known values regarding the location, orientation, and focal length of an image capture device which originally recorded the at least two image frames, comprising the steps of: receiving at least two image frames that each depict at least a single common road sign and which correspond to an identifier tag including at least one of the following items: camera number, frame number, camera location coordinates, or camera orientation; applying a fuzzy logic color filter to said at least two image frames; filtering out and saving image frame portions containing each region that contains at least one preselected color-set from a set of at least one approved road sign color; and saving to a memory location said image frame portions of the at least a single common road sign depicted in one of said at least two image frames which is linked to at least one of the following items: a camera number, an image frame number, a set of camera location coordinates, or a camera orientation direction used for recording.
 2. The method of claim 1, and prior to completing the step of applying the fuzzy logic color filter, practicing the step of converting said at least two image frames from a native color space to a single color space portion of a L*u*v* color space and wherein the fuzzy logic color filter provides maximum value output signals for only a set of preselected colors.
 3. The method of claim 2, wherein the value output signals are determined by location in said L*u*v* color space and wherein the value output signals are assigned to a minimal set of mathematically described colors representing all the legal color names and combinations.
 4. A method of rapidly recognizing road signs depicted in at least one frame of a digital videosignal, comprising the steps of: applying at least two equally weighted filters to at least one frame of a digital depiction of a road side scene so that for each of the at least two equally weighted filters a discrete output value is obtained; comparing the discrete output value for each respective said at least two equally weighted filters and if a discrete output of at least one of said at least two equally weighted filters does not exceed a reference value then discarding the at least one frame of digital videosignal, but if one said discrete output exceeds a reference value; and then setting a road sign “image present” flag for said at least one frame of a digital videosignal.
 5. The method of claim 4, further comprising the step of saving a bitmap image of a portion of said at least one frame of digital videosignal.
 6. The method of claim 4, further comprising the step of recording a location data metric corresponding to the location of the camera which originally recorded the at least one frame of digital videosignal.
 7. The method of claim 6, wherein the location data metric further comprises the direction the camera was facing while recording, the focal length of the camera, and the location of the camera as recorded by at least one global positioning device.
 8. The method of claim 4, further comprising the steps of: applying another filter which differentiates between various types of road signs; classifying as many of the images as possible by road sign type; and creating a record in a database corresponding to the type of road sign (if known), an approximate location of the road sign, the direction the road sign faces, and at least a portion of a bitmap of the at least one frame of digital videosignal containing said road sign.
 9. The method of claim 1, wherein the at least two filters are selected from a set of the following filters: an edge filter, a color-pair filter, a color filter operating in the L*u*v* color space, an edge filter combined with a line extender, or a color filter operating in the LCH color space.
 10. A method of recognizing a single road sign depicted in at least two different frames of a digital videosignal, comprising the steps of: searching by pairs, each location metric for each of a plurality of previously identified images of road signs having a common type, so that when at least two images appear to depict a single road sign, all of said at least two images are removed from the search space, until no more pairs are available for continued searching, and then proceeding to the first step for a next type of road sign until no further types of road signs are available for searching.
 11. The method of claim 10, further comprising the step of recording all said pairs into an auxiliary data structure.
 12. The method of claim 11, wherein if no other images of a road sign correspond to any other images of a road sign of same type, then forwarding the location metric to either a human operator or a storage medium for review.
 13. The method of claim 4, further comprising at least one more frame of digital videostream and wherein the at least one more frame of digital videostream was recorded with at least one additional camera and wherein the at least two equally weighted filters are each customized for each said at least one additional camera so that each said filter accounts for different focal lengths, illumination effects, or recording direction of each of said at least one additional camera.
 14. The method of claim 1, wherein in lieu of the fuzzy logic color set filter a neural network is applied to the at least two image frames.
 15. The method of claim 1, wherein the color-set is a single color.
 16. The method of claim 4, wherein the at least two equally weighted filters are selected from a set of the following filters: an edge filter, a color-pair filter, a color filter operating in the L*u*v* color space, an edge filter combined with a line extender, or a color filter operating in the LCH color space.
 17. The method of claim 16, further comprising the steps of: growing at least two edges that were filtered until they intersect; calculating an angle of convergence for said at least two edges; comparing the angle of convergence to a range of acceptable angles of convergence for a corner surface of at least one each of a class of objects of interest; and saving a record for each image from which the at least two edges were derived only if the calculated angle is within the range of acceptable angles.
 18. The method of claim 4, further comprising the step of setting a flag to “multiple signs present” if more than one sign is detected in a given image frame.
 19. The method of claim 4, further comprising the step of activating a unique visible symbol, an audible signal or tone, or a vibratory signal each time an image is detected.
 20. The method of claim 19, wherein the symbol, signal or tone, or vibratory signal corresponds to a unique type of image.
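
For illustration only, the edge-growing and angle comparison recited in claim 17 might be sketched as follows. The sketch assumes each filtered edge is available as a straight segment given by two (x, y) end points, treats "growing" as extending each segment along its own line, and measures the corner angle between the two edges as seen from their intersection; the acceptable range shown for a rectangular corner is a hypothetical value.

```python
from math import atan2, degrees


def intersection(seg_a, seg_b):
    """Point where the lines through two segments meet, or None if parallel."""
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None
    det_a = x1 * y2 - y1 * x2
    det_b = x3 * y4 - y3 * x4
    px = (det_a * (x3 - x4) - (x1 - x2) * det_b) / denom
    py = (det_a * (y3 - y4) - (y1 - y2) * det_b) / denom
    return (px, py)


def convergence_angle(seg_a, seg_b):
    """Corner angle in degrees (0 to 180) between the two grown edges,
    measured at their intersection; None if the edges never meet."""
    corner = intersection(seg_a, seg_b)
    if corner is None:
        return None

    def direction(seg):
        # Vector from the corner toward the midpoint of the original edge.
        (xa, ya), (xb, yb) = seg
        return ((xa + xb) / 2 - corner[0], (ya + yb) / 2 - corner[1])

    ax, ay = direction(seg_a)
    bx, by = direction(seg_b)
    return abs(degrees(atan2(ax * by - ay * bx, ax * bx + ay * by)))


def is_sign_corner(seg_a, seg_b, acceptable=(80.0, 100.0)):
    """True only if the angle of convergence lies within the acceptable range
    (hypothetical default: a roughly rectangular corner)."""
    angle = convergence_angle(seg_a, seg_b)
    return angle is not None and acceptable[0] <= angle <= acceptable[1]
```

A record for the source image would be saved only when `is_sign_corner` returns true; other sign classes, such as octagons or triangles, would supply their own acceptable ranges.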